- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 33.2k
gh-104400: Add more tests to pygettext #108173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @tomasr8 Thanks for opening the new PR! Could you look at the failing tests please? A | 
| hmm looks like an encoding issue. Hopefully, this'll fix it. | 
| Looks like the same three tests failed. Do you have access to a Windows computer? If not I should be able to have a look later on. A | 
| No worries! luckily I have a windows machine lying around 😅 The problem is that there's no way to specify the output encoding so it uses the platform default. This makes it difficult to compare the files because the charset is also part of the header.. I'll just normalize it the same way as I do with the creation date. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Some comments on the test code:
Co-authored-by: Adam Turner <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
| 
 Thanks for the review! | 
| cc. @warsaw who asked for a ping on Discourse :) | 
| """Tests to cover the Tools/i18n package""" | ||
|  | ||
| import os | ||
| from pathlib import Path | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to use pathlib? Other tests simply use os.path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were only a handful of uses of os so I went ahead and replaced them with pathlib which I think improves readability, but I'm happy to revert the change if you prefer to keep os :)
| I also added a  | 
        
          
                Lib/test/test_tools/test_i18n.py
              
                Outdated
          
        
      | def update_POT_snapshots(): | ||
| for input_file in DATA_DIR.glob('*.py'): | ||
| output_file = input_file.with_suffix('.pot') | ||
| contents = input_file.read_text(encoding='utf-8') | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have some files with non-UTF-8 encoding.
Since contents is only used to copy a file, you can read/write the binary content.
        
          
                Lib/test/test_tools/test_i18n.py
              
                Outdated
          
        
      | with temp_cwd(None): | ||
| Path(input_file.name).write_text(contents) | ||
| assert_python_ok(Test_pygettext.script, '--docstrings', input_file.name) | ||
| output = Path('messages.pot').read_text() | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you read text, always specify the encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This causes problems on Windows, where the encoding is cp1252 so reading it back as utf8 fails. I don't know how else to get around this besides forcing pygettext to always output utf8 (or adding a configurable parameter). Do you have any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use -Xutf8 or PYTHONIOENCODING=utf-8 to run pygettext, because the text can be non-encodable with the locale encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. 👍
| Thanks @tomasr8 for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13. | 
(cherry picked from commit dcae5cd) Co-authored-by: Tomas R. <[email protected]>
(cherry picked from commit dcae5cd) Co-authored-by: Tomas R. <[email protected]>
| GH-126361 is a backport of this pull request to the 3.13 branch. | 
| GH-126362 is a backport of this pull request to the 3.12 branch. | 
(cherry picked from commit dcae5cd) Co-authored-by: Tomas R <[email protected]>
(cherry picked from commit dcae5cd) Co-authored-by: Tomas R <[email protected]>
As suggested here, I'm splitting the PR which updates pygettext into multiple smaller PRs. This first one just adds more comprehensive tests to pygettext without changing any functionality.
The test cases I added test message & docstring extraction which were not covered before. I also added a specific test for file locations.
Instead of having a test case for each possibility, I group the test cases into files on which I run the script and compare the entire script output. This has several advantages:
msgidwhich is insufficient.While writing the tests, I actually found more bugs than in the original PR. I documented these in the tests - I don't think it's worth fixing these now as most of these will be fixed automatically once pygettext is converted to use the AST.